DeepSeek Coder V2 Open-Source Model Better GPT-4o | The Thought Collection
DeepSeek Coder 2 beats GPT4-Turbo open source coding model - Geeky Gadgets
Qwen2.5 Coder 32B Instruct - AI Model Details & Benchmarks
Qwen: Qwen3 Coder Flash - AI Model Details & Benchmarks
Introducing A Benchmark Model Gemini By Google - AiThority
Qwen3 Coder Performance Evaluation: A Comparative Analysis Against ...
How to Create Your Own Coding Benchmark to Test the Quality of ...
DeepSeek Coder
NVIDIA Llama Nemotron Ultra Open Model Delivers Groundbreaking ...
Musk’s xAI launches Grok 3, which it says is the ‘best AI model to date’
GitHub - grounded-coding/docground-benchmark: Benchmark with ...
BioCoder: A Benchmark for Bioinformatics Code Generation with Large ...
Critique-Coder: Enhancing Coder Models by Critique Reinforcement Learning
Cursor introduces its coding model alongside multi-agent interface ...
Mercury Coder Small Benchmarks - RankedAGI
Qwen Researchers Introduce CodeElo: An AI Benchmark Designed to ...
Qwen 2.5 Coder 32B | Open Laboratory
DeepSeek's latest R1 model matches OpenAI's o1 in reasoning benchmarks
OpenAI releases new coding benchmark SWE-Lancer showing 3.5 Sonnet ...
Benchmark Scores = General Capability + Claudiness
Understanding Model Benchmarks in Azure AI Studio
Incoder | Large Language Model
New benchmark reveals AI coding limitations despite industry claims
WizardCoder: Why Its the Best Coding Model Out There
AI Code Generation: New DevQualityEval Benchmark Reveals Which LLMs ...
GitHub - coderhs/plot_benchmark: Visualize the results of Benchmark
Qwen3 Coder: The New AI Coding Model – Features, Benchmarks - YouTube
GPT-5.2 vs Claude Opus 4.5: The Definitive Coding Benchmark Comparison ...
Qwen3-Coder is Alibaba's most "agentic" coding model to date
Introducing Stable Code Instruct 3B: The New Benchmark in Coding ...
New QWEN 3 Coder : Did the Benchmark's Lie? - Geeky Gadgets
OpenAI o1 Benchmark and Guide: Overview of o1-preview, o1-mini, Limits ...
Agentica Project's Open Source DeepCoder Model Outperforms OpenAI's O1 ...
Paper page - VCode: a Multimodal Coding Benchmark with SVG as Symbolic ...
Qwen3 Coder Next - Intelligence, Performance & Price Analysis
VCode: a Multimodal Coding Benchmark with SVG as Symbolic Visual ...
Qwen3-Next: Revolutionary 80B Model with Only 3B Active Parameters ...
LLM Benchmarks in 2024: Overview, Limits and Model Comparison
GPT-4o Benchmark - Detailed Comparison with Claude & Gemini | Wielded
A new promising benchmark for code generation models : r/llm_updated
OpenAI's new GPT-5 Codex model takes on Claude Code
Benchmark Framework. These models are handed over to the Benchmark ...
Introducing BigCodeBench by BigCode: The New Benchmark for Assessing ...
Claude 3.7 Sonnet vs Qwen 2.5 Coder
What Are Model Benchmarks? | Label Studio
GLM-4.6: Advanced Agentic, Reasoning and Coding Capabilities
DeepSeek-AI Introduce the DeepSeek-Coder Series: A Range of Open-Source ...
Top 10 Benchmarking PowerPoint Presentation Templates in 2026
Anthropic Unveils Claude 4 Opus and Sonnet, Claiming Seven-Hour ...
Performance Benchmarks and Metrics for Code-Generation AI: Evaluating ...
Best LLMs for coding: developer favorites
Introducing Epoch AI's AI benchmarking hub | Epoch AI
qwen2.5-coder:32b-instruct
ChatGPT and Other AI Assistants: An Ultimate Comparison | Beetroot
What are LLM Benchmarks?
Claude 3.7 Sonnet: How it Works, Use Cases & More | DataCamp
Qwen3-Coder is Finally Here and It's Breaking All the Coding Benchmarks
Top benchmarks for the best open-source coding LLMs in 2025
qwen2.5-coder:14b-instruct
15 LLM coding benchmarks
The performance of the open source coding assistance AI 'Qwen2.5-Coder ...
[Explainer] AI Models Have "Competitions" Too? One Article to Understand What a Benchmark Really Is! - CSDN Blog
Concept of model-based coding | Download Scientific Diagram
renchris/qwen3-coder
Dream-Coder 7B | HKU NLP Group
DeepSeek-Coder-V2 Tutorial: Examples, Installation, Benchmarks | DataCamp
The most common "Benchmarks and Controls" coding references. | Download ...
Introduction to StarCoder and StarCoder 2 - DebuggerCafe
Top AI Coding Tools in 2024: An In-Depth Analysis with Real-World ...
GitHub - symflower/eval-dev-quality: DevQualityEval: An evaluation ...
Open-Source Code Language Models: DeepSeek, Qwen, and Beyond
Top LLM Benchmarks Explained: MMLU, HellaSwag, BBH, and Beyond ...
Best AI Models for Coding and SDLC in 2025, Real Benchmarks, Real Tools ...
Claude 3.5 Sonnet vs GPT-4o: OpenAI Models for Coding | Bind AI
jaahas/qwen3-abliterated
Best Local LLM for Coding A Comprehensive Guide for Developers
AI Benchmarking Dashboard | Epoch AI
The Ultimate Guide to AI Benchmarks in 2026: 10 Must-Know Tests 🤖 ...
DeepSeek-V3.2: The Open-Source LLM Challenging GPT-5 & Gemini-3.0 in AI ...
What are OpenAI o3 and o4? | Zapier
Benchmarks for Comparing Human and AI Intelligence — LessWrong
Qwen2.5-Coder 32B Instruct: A Best Coding Model-A Complete Step-by-Step ...
LLM Code Performance: Top 10 Benchmarks Explained | by Vivedha Elango ...
MLE-bench - OpenAI Launches a Benchmarking Tool for Evaluating AI Agent Performance | AI工具集
A Complete Guide to Grok AI (xAI)
OpenAI's o1-preview vs o1-mini: A Step Forward to AGI
OpenAI’s new “reasoning” AI models are here: o1-preview and o1-mini ...
Toward Generalizable Evaluation in the LLM Era: A Survey Beyond ...
Complete Guide on using DeepSeek for Coding: Quick Code Generation
Qwen2.5: A Party of Foundation Models! - Alibaba Cloud Community
Benchmarks have accelerated AI-for-coding, but there are no AI-for ...
Polycoder | Large Language Models
unsloth/Qwen3-Coder-Next · Hugging Face
GPT code editing benchmarks | aider
Enhanced Coding Capabilities: Gemini 2.5 Pro (I/O Edition) - Fusion Chat
qwen3-coder:480b
contrastive-predictive-coding/benchmark_model.py at master ...
Active Code Learning: Benchmarking Sample-Efficient Training of Code ...
“Vibe coding” surges as a quarter of Y Combinator startups now get AI ...
The Ultimate 2025 Guide to Coding LLM Benchmarks and Performance ...
Improving Your Predictive Coding Model's Performance – Knowledge Base
Alibaba Launches Qwen2.5-Coder Models, 0.5B–32B, Outperforming GPT-4o ...
DeepCoder-14B open-source coding AI matches OpenAI o3-mini performance ...
GitHub - Benchmarking-Initiative/Benchmark-Models: A collection of ODE ...
What is LLM Benchmarks? Types, Challenges & Evaluators
Claude 3: The First Large Language Model to Beat GPT-4 – or ...
Claude 3.5 sets new AI benchmarks, beating GPT-4o in coding and ...
GitHub - wizard-III/ArcherCodeR: ArcherCodeR is an open-source ...
The AI Race Heats Up As Google And DeepSeek Unveil Competing AI ...